
Conversation


@HaloKim HaloKim commented Sep 25, 2025

Add support for Qwen3 MoE conversion.

Modified files:

  • mergekit/architecture/moe_defs.py
  • mergekit/moe/__init__.py
  • mergekit/moe/qwen.py
  • mergekit/moe/qwen3.py
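For context, a minimal sketch of the tensor layout a Qwen3 MoE conversion has to produce, assuming the Hugging Face Qwen3 MoE naming (a per-layer router weight at mlp.gate.weight plus gate_proj/up_proj/down_proj weights per expert). The helper below is purely illustrative, not mergekit's actual interface:

def qwen3_moe_layer_tensor_names(layer_idx: int, num_experts: int) -> list[str]:
    # Illustrative only: enumerate the MoE tensors one decoder layer needs.
    prefix = f"model.layers.{layer_idx}.mlp"
    names = [f"{prefix}.gate.weight"]  # router weight for this layer
    for expert_idx in range(num_experts):
        for proj in ("gate_proj", "up_proj", "down_proj"):
            names.append(f"{prefix}.experts.{expert_idx}.{proj}.weight")
    return names

# e.g. a layer with 128 experts yields 1 + 128 * 3 = 385 MoE tensors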


github-actions bot commented Sep 25, 2025


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the same format as below.


I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers have signed the CLA.
✅ [HaloKim](https://github.com/HaloKim)
❌ @dev7halo
dev7halo does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

cursor[bot]

This comment was marked as outdated.

@HaloKim
Author

HaloKim commented Sep 25, 2025

I have read the CLA Document and I hereby sign the CLA

@HaloKim
Author

HaloKim commented Sep 25, 2025

recheck

cursor[bot]

This comment was marked as outdated.

Comment on lines +91 to +93

# Add expert weights
for expert_idx in range(num_experts):


The parameter order in the expert loops is inconsistent between implementations: Qwen3MoeModuleArchitecture uses up_proj, gate_proj, down_proj, while KORMoMoeModuleArchitecture uses gate_proj, up_proj, down_proj. Align the two on a single order so expert weights are processed consistently across architectures.
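One way to pin the order down, sketched under the assumption that both architecture classes can share a module-level constant (the helper below is hypothetical, not mergekit's actual interface):

# Define the projection order once so both Qwen3MoeModuleArchitecture and
# KORMoMoeModuleArchitecture emit expert weights in the same sequence.
EXPERT_PROJ_ORDER = ("gate_proj", "up_proj", "down_proj")

def expert_weight_names(prefix: str, num_experts: int) -> list[str]:
    # prefix would be something like "model.layers.0.mlp"
    return [
        f"{prefix}.experts.{expert_idx}.{proj}.weight"
        for expert_idx in range(num_experts)
        for proj in EXPERT_PROJ_ORDER
    ]

Keeping the order in a single tuple means any future reordering only has to happen in one place.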

Spotted by Graphite Agent


cursor[bot]

This comment was marked as outdated.

tqdm.tqdm(router_weights, desc="Router weights")
):
writer.save_tensor(
f"model.layers.{layer_idx}.mlp.gate.linear.weight",


Bug: Router Gate Weight Naming Mismatch

The KORMoMoeModuleArchitecture expects router gate weights to be named mlp.gate.weight, but the KORMoMoE saving logic adds a .linear suffix, resulting in mlp.gate.linear.weight. This naming inconsistency prevents the model from loading correctly.
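For illustration, a small PyTorch sketch of why the extra .linear segment changes the state-dict key; the module layout here is an assumption, not the actual KORMo modeling code:

import torch.nn as nn

class PlainGateMlp(nn.Module):
    # Router stored directly as a Linear named `gate`:
    # its weight key ends in "gate.weight".
    def __init__(self, hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts, bias=False)

class GateWrapper(nn.Module):
    # Router that wraps a Linear in an attribute named `linear`:
    # its weight key ends in "gate.linear.weight" instead.
    def __init__(self, hidden: int, num_experts: int):
        super().__init__()
        self.linear = nn.Linear(hidden, num_experts, bias=False)

class WrappedGateMlp(nn.Module):
    def __init__(self, hidden: int, num_experts: int):
        super().__init__()
        self.gate = GateWrapper(hidden, num_experts)

print(list(PlainGateMlp(8, 4).state_dict()))    # ['gate.weight']
print(list(WrappedGateMlp(8, 4).state_dict()))  # ['gate.linear.weight']

Whichever convention the modeling file uses, the conversion code has to save the router weight under exactly that key.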


tqdm.tqdm(router_weights, desc="Router weights")
):
writer.save_tensor(
f"model.layers.{layer_idx}.mlp.gate.linear.weight",


The tensor path in kormo.py should be model.layers.{layer_idx}.mlp.gate.weight rather than model.layers.{layer_idx}.mlp.gate.linear.weight, to stay consistent with the MoEGate implementation in the modeling file. The current path saves the router weights under the wrong key, making them inaccessible to the model at load time.

Suggested change
- f"model.layers.{layer_idx}.mlp.gate.linear.weight",
+ f"model.layers.{layer_idx}.mlp.gate.weight",

Spotted by Graphite Agent

